Confirmatory Analysis of Test Structure
using Multidimensional Item Response Theory



Abstract 

The purpose of this research was to develop and evaluate a confirmatory
approach to assessing test structure using multidimensional item response
theory (MIRT). The approach investigated involves adding to the exponent of
the MIRT model an item structure matrix that allows the user to specify the
ability dimensions measured by an item. Various combinations of item
structures were fit to two sets of simulation data with known true structures,
and the results were evaluated using a likelihood ratio chi-square statistic
and two information-based model selection criteria. The results of these
analyses support the use of the confirmatory MIRT approach, since it was found
that the procedures could recover the true item structures. It was also found
that adding an additional ability dimension that forces together items that
ought not to be together noticeably deteriorates the quality of the solution.
On the other hand, imposing structures different from, but not inconsistent
with, the true structures does not necessarily yield worse fit. Finally, in
terms of model fit statistics, the consistent Akaike information criterion
performed better than the simple Akaike information criterion, while the
likelihood ratio chi-square was clearly inadequate.

Introduction 

Although item response theory (IRT) has proven to be a very powerful and
useful measurement tool, use of IRT models has been somewhat limited because
the available models require the assumption that the test being analyzed
measures only a single ability dimension. This unidimensionality assumption
often limits the application of IRT-based methods to tests consisting of
relatively homogeneous sets of items, such as might be found on a vocabulary
test. Tests including items sampled from several content areas, such as a
science test containing both physics and chemistry items, are probably not
sufficiently homogeneous as to permit analysis using IRT. Such may also be
the case with tests containing multi-faceted items, such as a mathematics test
containing problem-solving items requiring a high level of reading
comprehension or vocabulary skill.

Because it is clear that many tests measure more than a single ability
dimension (Traub, 1983), attempts have been made to extend IRT to
multidimensional tests. In multidimensional IRT, or MIRT, examinee responses
are modeled as a function of a set of examinee traits, and the assumption of
unidimensionality is replaced by the less restrictive requirement that the
dimensionality of the item responses matches the dimensionality of the set of
examinee traits used in the MIRT model.

This assumption is less restrictive in that it permits the application of
IRT methods to a much broader range of instruments, but it has its own
disadvantages. The most serious difficulty encountered in the application of
MIRT methodology is matching the model dimensionality to the dimensionality of
the test. This problem is particularly serious in view of the fact that there
are no generally accepted procedures for determining the dimensionality of a
test, especially if the test items are dichotomously scored. In addition, in
the case of correlated dimensions, a number of multidimensional solutions
might fit the data almost equally well. If this is so, it may be desirable to
choose a solution with a theoretical foundation in cognitive psychology or
test content rather than a solution with a slightly greater likelihood,
perhaps obtained by fitting error.

The purpose of this research was to develop and evaluate multidimensional
IRT procedures designed to permit the extraction of theory-based solutions.
The procedures developed include a MIRT model, a mechanism for imposing a
priori structures on the data, procedures for estimating the model parameters,
and model selection criteria for choosing among alternative structures. The
evaluation performed on these procedures included assessing the reasonableness
of simulation data generated to fit the model, evaluating the estimation
procedures, and evaluating the model selection criteria.
Background

Multidimensional IRT

Most of the recent progress in MIRT research has occurred in two areas:
the development of nonlinear factor analysis models (Bock, Gibbons, & Muraki,
1985; Christoffersson, 1975; Muthen, 1978), and the development of
multidimensional two- and three-parameter logistic MIRT models (McKinley,
1983, 1987, in preparation; Reckase, 1985; Reckase, Ackerman, & Carlson, 1988;
Reckase & McKinley, 1985). Work on these procedures is still at an early
stage, but has progressed to the point that estimation procedures are
available.

Nonlinear factor analysis model. The basic nonlinear factor analysis
model is based on the two-parameter normal ogive (2PNO) model,
although the method does allow for the input of previously estimated
c-parameter values. The 2PNO model assumes there is an unobservable response
variable which is on a continuous scale, and which is dichotomized into an
observed score of 0 or 1 depending on whether the examinee is above or below
some threshold point. The 2PNO model is given by

$$P_i(\theta_j) = \Phi\left(\frac{\gamma_i - a_i'\theta_j}{\sigma_i}\right), \tag{1}$$

where $P_i(\theta_j)$ is the probability of a correct response to item $i$ by
examinee $j$, $\Phi(x)$ represents the cumulative normal distribution, $a_i$ is
a column vector of $k$ item factor loadings for item $i$, $\theta_j$ is a
column vector of $k$ factor scores for examinee $j$, $\sigma_i$ is the standard
deviation of the normal distribution, $\gamma_i$ represents the threshold
beyond which an examinee will respond correctly to the item, and there are $k$
dimensions.

Applications of nonlinear factor analysis procedures include analysis of
the item pool for the computerized version of the Armed Services Vocational
Aptitude Battery, or ASVAB, by Zimowski and Bock (1987), analysis of the
Graduate Management Admission Test, or GMAT, by Kingston (1986), and analysis
of the Graduate Record Examinations (GRE) Subject Test in Mathematics by
McKinley and Kingston (1987).

Logistic models. In the logistic model approach, the normal ogive model
is replaced with the logistic distribution. Although the parallel to factor
analysis is lost in this approach, all of the desirable properties of IRT are
maintained, and the computation is greatly simplified. The multidimensional
three-parameter logistic, or M3PL, model is given by

$$P_i(\theta_j) = c_i + \frac{1 - c_i}{1 + \exp\left(-1.702\,(b_i + a_i'\theta_j)\right)}, \tag{2}$$

where $P_i(\theta_j)$ is the probability of a correct response to item $i$ by
examinee $j$, $\theta_j$ is a column vector of $k$ ability parameters for
examinee $j$, $a_i$ is a column vector of $k$ discrimination parameters for
item $i$, $b_i$ is the threshold parameter for item $i$, and $c_i$ is the lower
asymptote parameter for item $i$. There are $k$ dimensions, and the ability and
discrimination parameter vectors contain one element for each dimension.
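
As an illustration of how the M3PL response function can be evaluated, the following is a minimal Python sketch. The function name and the parameter values in the usage lines are hypothetical and chosen only for illustration; they do not come from any of the programs cited in this report.

```python
import math


def m3pl_probability(theta, a, b, c):
    """Probability of a correct response under the M3PL model.

    theta: list of k ability parameters for one examinee
    a:     list of k discrimination parameters for one item
    b:     threshold parameter for the item
    c:     lower asymptote parameter for the item
    """
    if len(theta) != len(a):
        raise ValueError("theta and a need one element per dimension")
    # Inner product a'theta over the k dimensions, plus the threshold.
    z = b + sum(a_k * t_k for a_k, t_k in zip(a, theta))
    return c + (1.0 - c) / (1.0 + math.exp(-1.702 * z))


# Hypothetical two-dimensional item; values are illustrative only.
p_m3pl = m3pl_probability(theta=[0.0, 0.0], a=[1.0, 0.8], b=0.0, c=0.15)
# Fixing c at zero gives the M2PL special case.
p_m2pl = m3pl_probability(theta=[0.0, 0.0], a=[1.0, 0.8], b=0.0, c=0.0)
```

With the exponent equal to zero, the logistic term is one half, so the probability is midway between the lower asymptote and 1.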

Uses of logistic MIRT models thus far have been somewhat limited,
primarily due to the recency of the development of estimation procedures for
applying the models. The multidimensional two-parameter logistic (M2PL)
model, which is formed by holding the M3PL c-parameter fixed at zero for all
items, was applied to a French proficiency exam by Kaya-Carton (1988). In
this application, parameters of the M2PL model were obtained using the
MULTIDIM (McKinley, 1987) program with item lower asymptote parameters fixed
at zero. The method was compared to maximum likelihood factor analysis and
Boolean factor analysis. Despite the shortness of the test (18 items) and the
small sample size (between 700 and 800 examinees), the results obtained were
positive. The MIRT solution was found to be interpretable and consistent with
the factor analysis solutions.

Estimation. For the nonlinear factor analysis procedure, the TESTFACT
program (Wilson, Wood, & Gibbons, 1984) is available. The TESTFACT program
uses marginal maximum likelihood estimation (MMLE) to estimate the item
parameters of the 2PNO factor analysis model, and provides a mechanism for
using item guessing parameters obtained from a previous analysis. In this
approach to estimation, examinee factor scores are treated as nuisance
parameters, and are removed from the estimation process by specifying a
distribution for them, and integrating over that distribution.

For the M2PL model, the MAXLOG (McKinley & Reckase, 1983) and MIRTE
(Carlson, 1987) programs are available, and for the M2PL and M3PL models the
MULTIDIM program (McKinley, 1987) is available. The MAXLOG and MIRTE programs
are based on a simultaneous, or joint, maximum likelihood estimation (MLE)
algorithm. In joint MLE, item parameters are estimated while ability
parameters are held fixed, and then ability parameters are estimated while
item parameters are held fixed. This two-step process is repeated until the
procedure converges. The MULTIDIM program uses MMLE.

Model selection. In MIRT, model selection is relatively straightforward.
Solutions for relatively simple models (such as the unidimensional 3PL model)
are obtained first. Models of increasing complexity are then created by
adding parameters. These more complex models subsume simpler models, making
it possible to test the significance of the contribution of the additional
parameters using procedures such as are implemented in TESTFACT.

For example, assume one- and two-dimensional MIRT solutions have been
obtained on the same data using the M2PL model. Comparing the solutions can
be accomplished by computing, for each solution, a measure of fit such as the
likelihood ratio chi-square statistic (Bock, Gibbons & Muraki, 1985). This
statistic is given by

$$G^2 = 2 \sum_{j=1}^{J} r_j \ln\left(\frac{r_j}{N P_j}\right), \tag{3}$$

where $J$ is the number of possible unique response strings for the item set to
be calibrated, $r_j$ is the number of examinees with response string $j$, $N$
is the total number of examinees in the calibration sample, and $P_j$ is
computed as

$$P_j = \sum_{k=1}^{q} L_j(X_k)\, W_k, \tag{4}$$

where $P_j$ represents the total likelihood of observing response string $j$,
$L_j(X_k)$ is the likelihood of observing response string $j$ given an ability
vector equal to $X_k$, and $X_k$ and $W_k$ are the quadrature nodes and weights
used for numerically integrating over the ability distribution.
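
The computation of the likelihood ratio statistic can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the logistic response function of Equation 2 is assumed for the item likelihoods, quadrature nodes are passed as a list of (ability vector, weight) pairs whose weights sum to one, and only response strings that were actually observed contribute to the sum (a string with zero frequency contributes nothing).

```python
import math


def item_probability(theta, a, b, c):
    """Logistic response function (assumed form, as in Equation 2)."""
    z = b + sum(a_k * t_k for a_k, t_k in zip(a, theta))
    return c + (1.0 - c) / (1.0 + math.exp(-1.702 * z))


def pattern_probability(pattern, items, nodes):
    """Marginal probability of one response string (Equation 4).

    pattern: tuple of 0/1 item scores
    items:   list of (a, b, c) tuples, one per item
    nodes:   list of (ability vector, weight) quadrature pairs
    """
    total = 0.0
    for x_k, w_k in nodes:
        likelihood = 1.0
        for u, (a, b, c) in zip(pattern, items):
            p = item_probability(x_k, a, b, c)
            likelihood *= p if u == 1 else 1.0 - p
        total += likelihood * w_k
    return total


def g_squared(counts, items, nodes):
    """Likelihood ratio chi-square (Equation 3).

    counts: dict mapping each observed response string to its frequency
    """
    n_total = sum(counts.values())
    total = 0.0
    for pattern, r_j in counts.items():
        if r_j > 0:
            expected = n_total * pattern_probability(pattern, items, nodes)
            total += r_j * math.log(r_j / expected)
    return 2.0 * total
```

When the observed frequencies equal the frequencies expected under the model, every logarithm is zero and the statistic is zero; larger values indicate poorer fit.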

The degrees of freedom for the statistic given by Equation 3 are given by

$$df = 2^n - n(m+2), \tag{5}$$

where $n$ is the number of items and $m$ is the number of dimensions. If the
c-parameter is not estimated the term in parentheses is $m+1$.

While it is doubtful the statistic given by Equation 3 is actually
distributed as a chi-square, the difference between the values of $G^2$ for
subsuming models ought to be distributed as a chi-square (Haberman, 1977).
The degrees of freedom for the difference between two values of $G^2$ is equal
to the difference between Equation 5 for the two solutions. For MIRT models,
this equals the difference in the number of item parameters estimated.
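
As a small worked example of this degrees-of-freedom calculation, the sketch below counts item parameters and takes their difference for two nested solutions. The function names are hypothetical, and it is assumed that each item contributes one discrimination parameter per dimension plus a threshold, plus a lower asymptote only when the c-parameter is estimated.

```python
def n_item_parameters(n_items, n_dims, estimate_c=True):
    """Number of item parameters estimated for an unconstrained MIRT model."""
    per_item = n_dims + 2 if estimate_c else n_dims + 1
    return n_items * per_item


def df_for_difference(n_items, dims_simple, dims_complex, estimate_c=True):
    """Degrees of freedom for the difference in G-squared between two
    subsuming solutions: the difference in item parameters estimated."""
    return (n_item_parameters(n_items, dims_complex, estimate_c)
            - n_item_parameters(n_items, dims_simple, estimate_c))


# One- versus two-dimensional M2PL solutions (c not estimated) for 80 items:
# the second solution adds one discrimination parameter per item.
df_80_items = df_for_difference(80, 1, 2, estimate_c=False)
```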

MIRT Limitations

Logistic and factor analytic MIRT procedures share a very serious
shortcoming: they are prone to overfitting the data. This occurs in part
because these procedures depend on large sample chi-square tests to assess the
significance of the contribution of additional ability dimensions. This
results in a very powerful test that often leads to retention of
statistically significant, yet uninterpretable, dimensions, perhaps based on
nothing more than chance relations in the data.



Confirmatory MIRT 



Confirmatory MIRT was developed largely as a response to this shortcoming
in MIRT. The goals of confirmatory MIRT, then, are to avoid overfitting the
data and to enhance the interpretability of obtained solutions by forcing a
correspondence between estimated ability dimensions and the content and
cognitive processes the instrument was intended to measure. This is
accomplished by inserting into the MIRT model an item structure matrix that
determines for a given item which ability dimensions are measured. Items that
in theory ought to measure the same ability dimensions are thereby clustered
together by assigning them identical structure matrices. Similarly, items
that in theory ought to differ as to which dimensions are measured are forced
apart by assigning them dissimilar structure matrices. Thus, ability
dimensions are defined prior to estimation based on a priori considerations.

Confirmatory MIRT model. The confirmatory MIRT, or CMIRT, procedure used
in this research is based on a modification of the M3PL model. As indicated
above, the modification consists of adding an item structure matrix. The
confirmatory M3PL, or CM3PL, model is given by

$$P_i(\theta_j) = c_i + \frac{1 - c_i}{1 + \exp\left(-1.702\,(b_i + a_i' S_i \theta_j)\right)}, \tag{6}$$

where $S_i$ is the item structure matrix for item $i$, and the remaining terms
are as previously defined.

Item structure matrices. The item structure matrix identifies for a
given item the ability dimensions measured. This is accomplished by
specifying either a 0 or a 1 for each element of $S$ in such a way that, when it
is premultiplied by the transpose of the item discrimination vector, item
discrimination parameters are zeroed out for those ability dimensions the item
does not measure. For example, if $S$ is the identity matrix, the item measures
all of the ability dimensions. If an item is to measure only the first and
third dimensions in a three-dimensional solution, $S$ would be given by

$$S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{7}$$
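
The effect of the item structure matrix on the exponent of the model can be illustrated with a short Python sketch. The function names and numeric values are hypothetical; the code simply forms the product of the transposed discrimination vector and the structure matrix, and then evaluates the confirmatory logistic model with that constrained vector.

```python
import math


def constrained_discrimination(a, s):
    """Row vector a'S: discriminations with unmeasured dimensions zeroed."""
    k = len(a)
    return [sum(a[row] * s[row][col] for row in range(k)) for col in range(k)]


def cm3pl_probability(theta, a, b, c, s):
    """Confirmatory M3PL response probability for one item and examinee."""
    a_s = constrained_discrimination(a, s)
    z = b + sum(w * t for w, t in zip(a_s, theta))
    return c + (1.0 - c) / (1.0 + math.exp(-1.702 * z))


# Structure matrix for an item measuring only the first and third
# dimensions of a three-dimensional solution.
S_FIRST_AND_THIRD = [[1, 0, 0],
                     [0, 0, 0],
                     [0, 0, 1]]

a_item = [1.2, 0.9, 0.7]  # illustrative discrimination values
a_constrained = constrained_discrimination(a_item, S_FIRST_AND_THIRD)
```

Because the second element of the constrained vector is zero, the response probability for this item does not change with an examinee's standing on the second dimension.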



Estimation. The estimation procedure used in this study was the CONFIRM
program, which is based on an EM algorithm similar to those described by Bock
and Aitkin (1981), Bock, Gibbons, and Muraki (1985), Mislevy and Bock (1985),
and Reckase and McKinley (1985). The algorithm has been modified for this
application to allow collapsing across extraneous ability dimensions in
accordance with the hypothesized item structures.

In this algorithm, item parameter estimation is performed using a
two-step marginal maximum likelihood procedure. In this procedure, examinee
ability is treated as a random variable, and is eliminated from the estimation
process by specifying a form for the ability distribution and integrating over
that distribution. The integration over the ability distribution, which is
accomplished through numerical quadrature, is performed during the first step,
the E-step, and produces an expected sample size and number-correct score at
each quadrature node for each item.

In MIRT applications, the values produced by the E-step are immediately
input into the M-step. In CMIRT applications, however, there is an
intermediate step performed prior to execution of the M-step. This additional
step involves a process that collapses expected sample size and number-correct
scores over ability dimensions not measured by an item. The output of this
step, then, is the expected sample size and number-correct scores for each
item collapsed in accordance with the specified item structures.
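
One way to picture the collapsing step is sketched below in Python. This is an assumption-laden illustration, not the CONFIRM implementation: it supposes the E-step output for one item is stored as a dictionary keyed by a tuple of quadrature node indices (one index per ability dimension), and that collapsing means summing the expected sample sizes and expected number-correct scores over the dimensions the item does not measure. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def collapse_expected_counts(expected, measured):
    """Sum E-step output over dimensions an item does not measure.

    expected: dict mapping a tuple of node indices (one per dimension) to a
              pair (expected sample size, expected number correct)
    measured: list of 0/1 flags, one per dimension, taken from the diagonal
              of the item structure matrix
    """
    collapsed = defaultdict(lambda: [0.0, 0.0])
    for node, (n_bar, r_bar) in expected.items():
        # Keep only the node indices for dimensions the item measures.
        key = tuple(idx for idx, keep in zip(node, measured) if keep)
        collapsed[key][0] += n_bar
        collapsed[key][1] += r_bar
    return {key: tuple(values) for key, values in collapsed.items()}


# Two dimensions with two nodes each; the item measures only the first.
e_step_output = {
    (0, 0): (10.0, 4.0),
    (0, 1): (20.0, 6.0),
    (1, 0): (30.0, 20.0),
    (1, 1): (40.0, 30.0),
}
collapsed_counts = collapse_expected_counts(e_step_output, measured=[1, 0])
```

The totals are preserved by the collapse; only the grid over which they are spread is reduced to the dimensions the item is hypothesized to measure.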

These values are then input into the M-step, which uses these values to
perform marginal maximum likelihood item parameter estimation. This process
is repeated until the item parameter estimates converge. As a final, optional
step, the item parameter estimates can be used as input into an expected a
posteriori, or EAP, ability estimation routine. For a complete description of
the CONFIRM program, see McKinley (in preparation).

Model selection. Clearly many different sets of $S$ matrices can be
applied to a given set of data. Consequently, a procedure for selecting from
among them is necessary. Unfortunately, the chi-square procedure described
above often cannot be applied, since in CMIRT alternative models are not
necessarily subsuming. For example, consider a four item test in which the
first two items are intended to measure vocabulary, and the last two items are
intended to measure reading comprehension. One test structure that might be
evaluated using CMIRT involves fitting one common ability dimension that all
four items are assumed to measure, and one additional dimension that only the
last two items are assumed to measure.

An alternative test structure that might be considered would be to assume
all four items measure a common dimension, and the middle two items measure a
second dimension. This structure would not correspond to any content-based
hypothesis about the structure of the test, but would serve as a useful
baseline for evaluating the first model. That is, it would provide an
indication of the extent of improvement (or deterioration, as the case may be)
in the quality of the solution to be expected simply from adding a second
dimension for two items.

Unfortunately, the model selection procedure described above cannot be
applied in this case. Not only are the competing models not subsuming, but
they result in equal degrees of freedom. Although the resulting chi-squares
could still be visually compared, the significance of the difference in the
chi-squares could not be tested.

One way these two competing models of test structure might be compared is
based on the work of Akaike (1973, 1987). This approach is based on a
criterion called the entropic information criterion (Bozdogan, 1987), also
known as the AIC, and involves evaluating model fit in terms of the natural
logarithm of the likelihood of the solution, which is presumed to be an
approximation of the expected natural logarithm of the likelihood of the true
model. The greater the likelihood of the solution (in practice, the lower the
negative log likelihood), the closer the fitted model is presumed to
approximate the true model.

The AIC statistic is given by

$$\mathrm{AIC} = -2 \log(L) + 2k, \tag{8}$$

where $\log(L)$ denotes the natural logarithm of the likelihood, and $k$ is the
number of parameters estimated. The $2k$ term constitutes a sort of penalty
function that penalizes over-parameterization.

A variation on the AIC, called the consistent AIC (CAIC), was proposed by
Bozdogan (1987). This statistic was derived in response to the criticism that
the AIC statistic does not provide an asymptotically consistent estimate of
model order (Bozdogan, 1987).

The CAIC statistic is given by

$$\mathrm{CAIC} = -2 \log(L) + k(\log(n) + 1), \tag{9}$$

where $n$ is the sample size. This modification of the AIC has the effect of
increasing the penalty for over-parameterization and, consequently, tends to
lead to the selection of simpler models.
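
The two criteria are simple to compute once the log likelihood of a solution is available. The following Python sketch shows both, together with a comparison of two hypothetical competing structures; the function names, model labels, and log likelihood values are invented for illustration and are not results from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def caic(log_likelihood, n_params, sample_size):
    """Consistent AIC: -2 log(L) + k(log(n) + 1)."""
    return -2.0 * log_likelihood + n_params * (math.log(sample_size) + 1.0)


# Two hypothetical non-subsuming structures with the same number of
# parameters, fit to the same 1000 examinees (made-up log likelihoods).
candidates = {
    "structure_A": {"log_likelihood": -2510.0, "n_params": 10},
    "structure_B": {"log_likelihood": -2535.0, "n_params": 10},
}
scores = {
    name: caic(m["log_likelihood"], m["n_params"], sample_size=1000)
    for name, m in candidates.items()
}
# The preferred model is the one with the smallest criterion value.
preferred = min(scores, key=scores.get)
```

Because the per-parameter penalty of the CAIC, log(n) + 1, exceeds the AIC penalty of 2 whenever the sample size is greater than about 3, the CAIC is the more conservative of the two for any realistic calibration sample.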

One reason these statistics are desirable is that they are designed to
identify which of a class of models is the closest approximation to the true
model. Unlike classical chi-square tests of model fit, in which a constrained
model is typically evaluated by comparing it to a more saturated, subsuming
model, with the AIC and CAIC statistics each model under consideration is
evaluated in terms of its closeness to the true model. Different models are
not directly compared, so there is no requirement that competing models be
subsuming.

The AIC and CAIC statistics have a very interesting property that makes
them very desirable even in situations where use of the chi-square statistic
is possible: the level of significance for testing whether a particular
model is the best-fitting model is implicit to the model-selection criterion
(Bozdogan, 1987). In effect, the critical value is embodied in the penalty
for over-parameterization, and the probability of a Type I error is determined
by the sample size. Thus, selecting the CAIC over the AIC is tantamount to
selecting a larger critical value, which results in a reduction in the Type I
error rate. Moreover, once a statistic has been selected, increasing the
sample size has the effect of decreasing the probability of a Type I error.
Indeed, the Type I error rate decreases exponentially with increased sample
size. In fact, for the CAIC statistic, the error rate asymptotically goes to
zero.



Method 

Overview 

The evaluation of the CMIRT procedure described above was performed using
simulation data. Two sets of simulation data were generated using different
true structure matrices. Several different solutions, based on different
structure matrices, were then obtained and compared in terms of model fit
using the indices described above.

Although the model proposed above was a multidimensional extension of the
unidimensional 3PL model, for the purpose of evaluating the proposed CMIRT
procedures the lower asymptote parameter, c, was held constant. This was done
to avoid complications arising from errors in estimation of the c-parameter.
At this point it is unclear how such errors in estimation affect the solution
obtained, but experience with unidimensional IRT estimation procedures suggests
the effect is potentially serious.

Data Generation

As was indicated above, two sets of data were used for this evaluation.
For both sets of data, true c-parameters were not allowed to vary. Rather, a
constant value of 0.15 was used. For each simulation dataset, responses were
generated for 1000 examinees and 80 items.

The first set of simulation data was generated to have only one ability
dimension. For these data examinee true abilities were selected randomly from
a standard normal distribution. The same item structure matrix was used for
all items. The matrix used was, in effect, a scalar with a value of 1.

The second set was generated using three uncorrelated ability dimensions.
Examinee true abilities were selected from a trivariate normal distribution
with a mean vector equal to zero and a covariance matrix equal to the identity
matrix. Two different item structure matrices were used to generate these
data. The first matrix, used for the first 40 items, is given by

$$S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \tag{10}$$

Thus, these items each measured the first two ability dimensions. The second
item structure matrix, used for items 41 through 80, is given by

$$S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{11}$$

These items each measured the first and third ability dimensions.
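
A data generation scheme of this kind can be sketched in Python as follows. Several details are assumptions made only for the sketch: the distributions used to draw the true a- and b-values (uniform and standard normal, respectively) are not stated in the text, the random seed is arbitrary, and the function and constant names are hypothetical. The structure matrices are represented by their diagonals.

```python
import math
import random

N_EXAMINEES = 1000
N_ITEMS = 80
C_TRUE = 0.15                   # constant lower asymptote
DIAG_ITEMS_1_TO_40 = [1, 1, 0]  # diagonal of the first structure matrix
DIAG_ITEMS_41_TO_80 = [1, 0, 1] # diagonal of the second structure matrix


def simulate_three_dimensional_data(seed=0):
    """Return (items, responses) for the three-dimensional design."""
    rng = random.Random(seed)
    items = []
    for i in range(N_ITEMS):
        diag = DIAG_ITEMS_1_TO_40 if i < 40 else DIAG_ITEMS_41_TO_80
        # ASSUMPTION: generating distributions for a and b are illustrative.
        a = [rng.uniform(0.5, 1.5) * d for d in diag]
        b = rng.gauss(0.0, 1.0)
        items.append((a, b))

    responses = []
    for _ in range(N_EXAMINEES):
        # Uncorrelated abilities: mean vector zero, identity covariance.
        theta = [rng.gauss(0.0, 1.0) for _ in range(3)]
        row = []
        for a, b in items:
            z = b + sum(a_k * t_k for a_k, t_k in zip(a, theta))
            p = C_TRUE + (1.0 - C_TRUE) / (1.0 + math.exp(-1.702 * z))
            row.append(1 if rng.random() < p else 0)
        responses.append(row)
    return items, responses
```

Each response is scored 1 when a uniform random draw falls below the model probability, which is the usual way of turning response probabilities into simulated dichotomous item scores.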

Table 1 provides summary statistics for each model parameter used to 
generate each dataset. Note that for the three-dimensional data, there were 
only 40 item discrimination values on the second and third dimensions. The 
remaining 40 values were set equal to zero. 

For the unidimensional data, the correlation between the true a-values
and b-values was 0.01. For the three-dimensional data, the correlation
between the a-values was -0.14 for the first and second dimensions, 0.21 for
the first and third dimensions, and 0.0 for the second and third dimensions
(note that no item had a-values for both the second and the third dimensions).
The b-values for the three-dimensional data had correlations of -0.09, 0.23,
and 0.17 with the a-values on the first, second and third dimensions,
respectively. For the three-dimensional data, the correlations of the true
ability parameters were 0.0 for the first and second dimensions and for the
first and third dimensions, and -0.02 for the second and third dimensions.



Table 1

True Parameter Distribution Summary Statistics

Dataset/Parameter        N      Mean    Std. Dev.    Min.     Max.

1-Dimensional
  a                     80      1.00      0.29       0.52     1.48
  b                     80     -0.14      0.93      -2.28     1.83
  c                     80      0.15      0.00       0.15     0.15
  θ                   1000     -0.05      0.93      -2.85     2.77

3-Dimensional
  a_1                   80      1.01      0.28       0.53     1.48
  a_2                   40      1.00      0.26       0.53     1.43
  a_3                   40      0.92      0.29       0.53     1.48
  b                     80      0.01      0.93      -3.34     2.28
  c                     80      0.15      0.00       0.15     0.15
  θ_1                 1000     -0.05      0.99      -4.03     3.76
  θ_2                 1000     -0.01      0.99      -3.13     2.83
  θ_3                 1000      0.00      0.99      -3.12     3.70

Solutions 

Several different solutions were obtained for each set of data. For both
sets of data, the first solution obtained was unidimensional, and used for
each item an item structure matrix that was, in effect, a scalar with a value
of 1. For the unidimensional data, this solution, signified by the code 1DU
(one-dimensional unconstrained), represents the true structure of the data.

The second solution obtained for each set of data was two-dimensional,
with each item measuring both dimensions. This was accomplished by using for
all items an item structure matrix equal to a two-by-two identity matrix. This
solution is signified by the code 2DU (two-dimensional unconstrained).

The third solution obtained for both sets of data was also two-dimensional.
However, for this solution, designated 2DC (two-dimensional constrained), the
two-by-two identity matrix was used only for items 41 through 80. For items 1
through 40, the matrix used is given by

$$S = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \tag{12}$$

For the three-dimensional simulation data, two additional solutions were
obtained. The first of these, designated as 3DCa (three-dimensional
constrained, solution a), used for each item the matrix used to generate the
data (given by Equations 10 and 11). The other solution, designated 3DCb
(three-dimensional constrained, solution b), used for items 1 through 20 and
items 61 through 80 the matrix given by Equation 10, and for items 21 through
60 the matrix given by Equation 11. Thus, the first and last 20 items
measured the first two ability dimensions, while the middle 40 items measured
the first and third ability dimensions.

For all solutions derived during this evaluation, estimates of the
c-parameter were held fixed at their true value, 0.15. All other item
parameters were estimated, with the restriction that a maximum value of 1.8
and a minimum value of 0.1 were imposed on the a-values. The estimation
procedure was allowed to cycle through the EM process until the likelihood of
the response matrix, given the CMIRT model and current item parameter
estimates, ceased to increase, or until the item parameter estimates ceased to
change, whichever came first.

Analyses 

For each set of data the following analyses were performed. First, a
principal components analysis of phi coefficients and a traditional item
analysis were performed to evaluate the reasonableness of the generated data,
and to provide information to aid in the interpretation of the results of the
CMIRT analyses. (It should be noted that tetrachoric correlations were also
computed for both sets of data, but in both cases the correlation matrix was
found to be non-Gramian.)
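
For readers unfamiliar with the phi coefficient, it is the Pearson correlation between two dichotomously scored items, and it can be computed directly from the two-by-two table of joint item scores. The Python sketch below shows that computation; the function name and the tiny score vectors are hypothetical and serve only as an illustration.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two lists of 0/1 item scores."""
    n11 = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    n10 = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    n01 = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    n00 = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    # Product of the four marginal totals of the two-by-two table.
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0.0:
        raise ValueError("phi is undefined when an item has no variance")
    return (n11 * n00 - n10 * n01) / denominator


# Illustrative score vectors for four examinees.
phi_identical = phi_coefficient([1, 1, 0, 0], [1, 1, 0, 0])
phi_unrelated = phi_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

A matrix of such coefficients for all item pairs is the input to the principal components analysis described above.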

Second, several different CMIRT solutions were obtained for each set of
data. Then, these solutions, which varied not only in dimensionality, but
also in the item structure matrices used, were evaluated using the AIC and
CAIC model selection criteria, as well as the likelihood ratio chi-square
goodness-of-fit statistic.

In addition, as an aid to interpretation, correlations between the true
parameters and the parameter estimates obtained for each solution were
computed. Of course, these correlations cannot be used to evaluate the
quality of the solutions, since in some cases estimates are obtained for
parameters not even used in the data generation. Moreover, it is possible
that some solutions with estimates that have low correlations with the true
parameters may, in fact, represent rotations and/or translations of the true
structures used to generate the data.

Finally, residuals were computed and analyzed, using the procedure
described by Divgi (1980), to determine whether there appeared to be any
interpretable common variance remaining after the model was fit to the data.
Residuals were computed as

$$R_{ij} = P_{ij} - u_{ij}, \tag{13}$$

where $P_{ij}$ is the probability of the observed response to item $i$ by
examinee $j$ predicted from the MIRT model, and $u_{ij}$ is the observed
response. Residual correlation matrices were analyzed using principal
components analysis, and resulting component loadings were examined. [Note
that, since residuals are on a continuous scale, the problem of using
principal components analysis with binary data is avoided using this
procedure.]
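
The first two stages of this residual analysis, forming the residuals and their inter-item correlation matrix, can be sketched in Python as follows. The function names and the small input tables are hypothetical; tables are laid out with one row per examinee and one column per item, and the final principal components step (an eigendecomposition of the correlation matrix) is not shown.

```python
import math


def residual_table(predicted, observed):
    """Residuals: model-predicted probability minus observed response."""
    return [
        [p - u for p, u in zip(p_row, u_row)]
        for p_row, u_row in zip(predicted, observed)
    ]


def correlation_matrix(rows):
    """Pearson correlations between the columns (items) of a table.

    Assumes every column has nonzero variance.
    """
    n = len(rows)
    n_items = len(rows[0])
    means = [sum(row[i] for row in rows) / n for i in range(n_items)]
    centered = [[row[i] - means[i] for i in range(n_items)] for row in rows]
    sums_of_squares = [
        sum(row[i] ** 2 for row in centered) for i in range(n_items)
    ]
    corr = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            cross = sum(row[i] * row[j] for row in centered)
            corr[i][j] = cross / math.sqrt(
                sums_of_squares[i] * sums_of_squares[j]
            )
    return corr


# Illustrative values: three examinees by two items.
predicted = [[0.8, 0.3], [0.6, 0.5], [0.2, 0.9]]
observed = [[1, 0], [0, 1], [0, 1]]
residual_corr = correlation_matrix(residual_table(predicted, observed))
```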



Results 

Item Analyses 

Table 2 presents the means, standard deviations, minimums, and maximums
of the examinee number-correct scores, item-total biserials, and item
proportion-correct (p+) scores for each dataset. The correlation between item
biserials and p+ values was 0.59 for the unidimensional data and 0.65 for the
three-dimensional data. The KR-20 coefficient of reliability was 0.94 for the
unidimensional data, and 0.95 for the three-dimensional data.

Table 2

Traditional Item Analysis Results

Dataset/Statistic        Mean     S.D.     Min.     Max.

1-Dimensional
  Number-Correct        42.95    15.38     7.00    77.00
  Item Biserial          0.51     0.15     0.09     0.77
  Item p+                0.54     0.19     0.17     0.92

3-Dimensional
  Number-Correct        45.34    16.52     8.00    80.00
  Item Biserial          0.56     0.11     0.22     0.73
  Item p+                0.57     0.18     0.19     0.92

These results indicate that the response data simulated according to the
CMIRT model were fairly realistic. The means and standard deviations of the
examinee number-correct scores suggest a test of appropriate difficulty for
the distribution of ability used in this research. Moreover, item p+ and
biserial values varied to a reasonable degree. The only departures from what
might typically be obtained with real data are: 1) the mean biserials and the
KR-20 are somewhat high, indicating the relative purity of the simulation
data; and, 2) the correlation between the item p+ and biserial values was
high (for comparison, values of this correlation computed on items pretested
for the Test of English as a Foreign Language over the eight year period from
1981 through 1988 were 0.33, 0.29, and 0.44 for Sections 1, 2, and 3,
respectively). This, too, is probably a reflection of the purity of the data.
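
Two of the traditional statistics reported here, the item proportion-correct values and the KR-20 reliability coefficient, can be computed from a scored response matrix with a few lines of Python. The sketch below uses the standard KR-20 formula with the population variance of the number-correct scores; the function names and the tiny response matrix are hypothetical.

```python
def proportion_correct(responses):
    """Item p+ values from a matrix of 0/1 scores (rows are examinees)."""
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(n_items)]


def kr20(responses):
    """Kuder-Richardson formula 20 reliability coefficient."""
    n = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Population variance of the number-correct scores.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = sum(p * (1.0 - p) for p in proportion_correct(responses))
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Illustrative data: four examinees, three perfectly consistent items.
toy_responses = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
toy_p_plus = proportion_correct(toy_responses)
toy_kr20 = kr20(toy_responses)
```

For the toy data every item separates the same two groups of examinees, so the coefficient reaches its maximum of 1; real and simulated tests fall below that value.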

Principal Components Analyses

Table 3 summarizes the results of the principal components analyses
performed on the two datasets. Provided for each dataset are the first 10
eigenvalues, along with the percent and cumulative percent of variance.



Table 3 

Principal Components Analysis Results 

                  1-Dimensional Data               3-Dimensional Data 
            Eigen-  Percent of  Cumulative   Eigen-  Percent of  Cumulative 
Component   value    Variance     Percent    value    Variance     Percent 

    1       14.91      18.6        18.6      16.44      20.6        20.6 
    2        1.95       2.4        21.1       6.51       8.1        28.7 
    3        1.33       1.7        22.7       1.97       2.5        31.2 
    4        1.31       1.6        24.4       1.54       1.9        33.1 
    5        1.27       1.6        26.0       1.34       1.7        34.8 
    6        1.24       1.6        27.5       1.18       1.5        36.2 
    7        1.22       1.5        29.1       1.13       1.4        37.6 
    8        1.20       1.5        30.6       1.11       1.4        39.0 
    9        1.18       1.5        32.1       1.09       1.4        40.4 
   10        1.17       1.5        33.6       1.08       1.3        41.7 
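
The percent-of-variance columns in Table 3 appear to follow from dividing 
each eigenvalue by the number of items (80), since the eigenvalues of a 
correlation matrix sum to the number of variables. The short Python sketch 
below checks this reading against the first four unidimensional eigenvalues; 
the function name is hypothetical. 

```python
def variance_percentages(eigenvalues, n_items):
    """Return (percent, cumulative percent) lists for PCA eigenvalues.

    For a correlation matrix the eigenvalues sum to the number of items,
    so each component accounts for eigenvalue / n_items of the variance.
    """
    percents, cumulative = [], []
    running = 0.0
    for value in eigenvalues:
        running += value
        percents.append(100.0 * value / n_items)
        cumulative.append(100.0 * running / n_items)
    return percents, cumulative


# First four eigenvalues for the unidimensional data, from Table 3.
one_dim = [14.91, 1.95, 1.33, 1.31]
pct, cum = variance_percentages(one_dim, n_items=80)
for component, (p, c) in enumerate(zip(pct, cum), start=1):
    print(f"component {component}: {p:.2f}% of variance, {c:.2f}% cumulative")
```

Because the tabled eigenvalues are themselves rounded, the computed 
percentages agree with the tabled ones only to within rounding. 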



As has been pointed out by many researchers (see, for example, Bock, 
Gibbons, & Muraki, 1985; Carroll, 1945; Lord & Novick, 1968; Reckase, 1981; 
and Tucker, Humphreys, & Roznowski, 1986), a principal components analysis of 
phi coefficients is fraught with dangers. Among these are the likelihood of 
obtaining spurious components due to item difficulty and nonlinearity. The 
results reported above illustrate these problems. Although the first set of 
data were generated to have only one dimension, the principal components 
analysis suggests the presence of a second component. However, further 
examination reveals that the second component is essentially a difficulty 
component. The correlation between the item loadings on the second component 
and item proportion-correct scores was -0.86. In contrast, the correlation 
between the first component and the item proportion-correct scores was 0.48, 
which is about what would be expected in light of the previously reported 
finding that the item proportion-correct scores and item biserials had a 
correlation of 0.59 for these data. The correlation between the first 
component and the item biserials was 0.98. 

A similar, though more complex, pattern emerged for the three-dimensional 
data. The data were generated to have three ability dimensions: one 
dimension which all items measured, one dimension only the first 40 items 
measured, and one dimension only items 41 through 80 measured. This resulted 
in a principal components solution in which there were two components: one 
common component, on which all items had positive loadings, and one bipolar 
component, on which the first 40 items all had positive loadings and the last 
40 items all had negative loadings. However, the principal components 
analysis results shown in Table 3 indicate the presence of a small third 
component. As was the case with the unidimensional data, this additional 
component is the result of using phi coefficients. The correlation between 
the loadings on the third component and the item proportion-correct scores was 
-0.89. The correlation of item proportion-correct scores and item loadings 
was 0.49 and -0.11 for the first and second components, respectively. As 
reported above, the correlation between the item biserials and item 
proportion-correct scores for these data was 0.65. The item biserials had 
correlations of 0.95, -0.12, and -0.53 with the first, second, and third 
components, respectively. 
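
To make the three-dimensional structure described above concrete, the 
following Python sketch builds the item structure (all 80 items on the first 
dimension, items 1 through 40 on the second, items 41 through 80 on the 
third) and simulates responses from it. Several things here are assumptions 
rather than the procedure used in this research: the compensatory logistic 
form, the intercept parameterization, the independent standard normal 
abilities, and all function and argument names are hypothetical. 

```python
import math
import random


def true_structure(n_items=80):
    """Structure rows (s1, s2, s3): 1 if the item measures that dimension."""
    half = n_items // 2
    return [(1, 1, 0) if i < half else (1, 0, 1) for i in range(n_items)]


def response_probability(theta, a, s, intercept, c=0.0):
    """Probability of a correct response under an ASSUMED compensatory form.

    The structure flags s switch each discrimination a[k] on or off inside
    the exponent; c is a fixed lower asymptote.
    """
    z = sum(sk * ak * tk for sk, ak, tk in zip(s, a, theta)) + intercept
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def simulate(n_examinees, structure, a, intercepts, c=0.0, seed=0):
    """Draw 0/1 responses; abilities are ASSUMED independent N(0, 1)."""
    rng = random.Random(seed)
    n_dims = len(structure[0])
    data = []
    for _ in range(n_examinees):
        theta = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        row = []
        for i, s in enumerate(structure):
            p = response_probability(theta, a[i], s, intercepts[i], c)
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data
```

With this structure, an item among the first 40 is unaffected by the third 
ability, which is what produces the bipolar second component in a principal 
components analysis of such data. 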

CMIRT Analyses 

Unidimensional data. Table 4 presents summary statistics for the 
parameter estimates obtained for the unidimensional data. Shown are the 
means, standard deviations, minimum, and maximum values for each parameter 
estimated in each solution, along with the number of values estimated for each 
parameter. It should be noted that no attempt was made to place the estimates 
on the same scale as the true parameters. Nor have different sets of 
estimates been placed on the same scale. Consequently, the summary statistics 
shown in Table 4 should not be used to assess similarity of estimates to true 
parameters or other estimates. 

It can be seen from Table 4 that the summary statistics for the a-values 
on the first dimension (the only dimension for the 1DU solution) were similar 
across solutions, although the mean was a little higher for the 2DU solution. 
Likewise, the ability estimate distributions for the first dimension and the 
b-values did not vary much across solutions. The second dimension summary 
statistics were also similar across solutions for both the a-values and 
ability estimates, and in both cases the statistics were different from those 
obtained for the first dimension. The ability estimates on the second 
dimension had relatively small standard deviations, and the a-values on the 
second dimension had low means compared to the first dimension, suggesting 
the second estimated dimension may be nothing but noise. 



Table 4 

Parameter Estimate Distribution Summary Statistics 
for the Unidimensional Data 

Solution/Parameter      N      Mean   Std. Dev.    Min.     Max. 

1DU 
  a                    80      0.77     0.23       0.37     1.26 
  b                    80     -0.12     0.95      -3.41     1.91 
  θ                  1000     -0.08     1.13      -2.78     2.99 

2DC 
  a1                   80      0.71     0.22       0.33     1.20 
  a2                   40      0.30     0.15       0.10     0.74 
  b                    80     -0.15     0.92      -2.65     1.92 
  θ1                 1000     -0.04     1.18      -2.77     3.15 
  θ2                 1000     -0.01     0.54      -1.54     1.72 

2DU 
  a1                   80      0.85     0.27       0.34     1.42 
  a2                   80      0.44     0.20       0.10     0.86 
  b                    80     -0.08     0.97      -3.60     1.98 
  θ1                 1000     -0.11     0.90      -2.37     2.58 
  θ2                 1000     -0.06     0.69      -1.82     1.97 



Table 5 shows the correlations of the true item parameters with the 
estimated item parameters for each solution for the unidimensional data. 

[Note that, since the c-parameter was held fixed, it is not included in Table 
5.] It can be seen from the data shown in Table 5 that, for each solution, the 
a-parameter estimates for the first dimension were highly correlated with the 
true parameters. The meaning of the moderate correlations obtained between 
the a-parameter estimates on the second dimension and the true a-values is 
unclear, and may be a result of overfitting. It is interesting to note that 
the correlation between the a-parameter estimates on the second dimension and 
the true a-values was higher for the 2DU solution than for the 2DC solution, 
and that the correlation between the a-parameter estimates on the first 
dimension and the true values was lower for the 2DU solution. It appears as 
though increasing the number of parameters estimated on the second dimension 
produced a deterioration of the fitting of the first dimension. 



Table 5 

True and Estimated Item Parameter Correlations 
for the Unidimensional Data 

                                True Parameter 
Solution/Parameter      N         a         b 

1DU 
  a                    80       0.89      0.07 
  b                    80       0.05      0.99 

2DC 
  a1                   80       0.91      0.15 
  a2                   40       0.53     -0.18 
  b                    80       0.03      0.99 

2DU 
  a1                   80       0.81     -0.04 
  a2                   80       0.67      0.22 
  b                    80       0.07      0.98 



The true and estimated ability parameter correlation was 0.96 for the 1DU 
solution. For the 2DC solution, the correlation between the true ability 
parameter and the ability estimates on the first dimension was 0.96, and for 
the second dimension it was 0.32. For the 2DU solution, the correlation 
between the true abilities and the ability estimates was 0.92 for the first 
dimension and 0.65 for the second. 

Table 6 shows the chi-square, AIC, and CAIC values obtained for each 
solution for the unidimensional data. For these data, all three pair-wise 
comparisons could be tested for the significance of the differences in the 
associated chi-square values. All three chi-square differences were 
significant. 



Table 6 

Model Selection Criteria Values 
for the Unidimensional Data 

Solution       Chi-Square       AIC         CAIC 

1DU (true)       70947.6      85083.2     86028.4 
2DC              71102.7      85318.2     86499.8 
2DU              70618.0      84913.5     86331.4 



As shown in Table 6, if the chi-square criterion is used, the 
unconstrained two-dimensional solution would be selected as optimal, even 
though the data are actually unidimensional. The unidimensional solution 
would be chosen over the constrained two-dimensional solution. Use of the AIC 
would result in the same ordering of solutions. Using the CAIC as a 
criterion, however, would result in selection of the unidimensional solution. 
The unconstrained two-dimensional solution would be selected over the 
constrained two-dimensional solution. 
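
The different behavior of the AIC and CAIC comes from their penalty terms. 
Assuming the usual definitions, AIC = -2 ln L + 2k and CAIC = -2 ln L + 
k(ln n + 1), the two are related by CAIC = AIC + k(ln n - 1). The Python 
sketch below applies this relation to the Table 6 values. Taking k to be the 
number of estimated a- and b-parameters (the N column of Table 4) and n to be 
the 1000 examinees is an inference from the tabled values, not something 
stated in this section; the function name is hypothetical. 

```python
import math


def caic_from_aic(aic, k, n):
    """CAIC implied by an AIC value: CAIC = AIC + k * (ln(n) - 1)."""
    return aic + k * (math.log(n) - 1.0)


# solution: (AIC, CAIC, k), with AIC and CAIC copied from Table 6.
# k counts a- and b-parameters only; this is an inferred choice.
table6 = {
    "1DU": (85083.2, 86028.4, 80 + 80),
    "2DC": (85318.2, 86499.8, 80 + 40 + 80),
    "2DU": (84913.5, 86331.4, 80 + 80 + 80),
}

for name, (aic, caic, k) in table6.items():
    implied = caic_from_aic(aic, k, n=1000)
    print(f"{name}: tabled CAIC {caic:.1f}, implied CAIC {implied:.1f}")
```

Under these assumptions each additional item parameter costs 2 units of AIC 
but about 7.9 units of CAIC, which is why the CAIC alone prefers the 
unidimensional solution here. 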

Three-dimensional data. Table 7 provides summary statistics for the 
parameter estimates obtained for the three-dimensional data. As was the case 
previously, no attempt at scaling these estimates has been made. 

The summary statistics for the a-parameter estimates on the first 
dimension were similar across solutions, although for the 2DU and 3DCb 
solutions the mean a-value tended to be a little lower than for the other 
solutions, and the mean a-value was a little higher for the 3DCa solution than 
for the others. There was very little variation across solutions in the 
ability estimate distributions for the first dimension, or for the b-values. 
For the two-dimensional solutions, the a-values on the second dimension 
differed noticeably in mean value, with the mean being 0.3 higher for the 2DC 
solution, and the 2DC a-values on the second dimension were less variable 
than for the 2DU solution. There was not a difference in the second dimension 
ability estimate distributions for these two solutions. 

For the three-dimensional solutions, the a-values on the second and third 
dimensions had similar means and standard deviations, and on both dimensions 
the means were higher for the 3DCa solution than for the 3DCb solution. The 
ability estimate distributions were similar for the second and third 
dimensions for both three-dimensional solutions. 

Table 7 

Parameter Estimate Distribution Summary Statistics 
for the Three-dimensional Data 

Solution/Parameter      N      Mean   Std. Dev.    Min.     Max. 

1DU 
  a                    80      0.78     0.19       0.41     1.26 
  b                    80      0.02     0.85      -2.54     2.26 
  θ                  1000     -0.05     1.17      -3.04     3.31 

2DC 
  a1                   80      0.77     0.23       0.31     1.25 
  a2                   40      0.92     0.22       0.53     1.31 
  b                    80     -0.06     1.02      -3.37     2.57 
  θ1                 1000      0.08     1.20      -2.61     3.06 
  θ2                 1000     -0.07     1.12      -3.11     3.48 

2DU 
  a1                   80      0.65     0.40       0.10     1.40 
  a2                   80      0.62     0.36       0.10     1.33 
  b                    80      0.10     1.01      -3.19     2.68 
  θ1                 1000     -0.09     1.18      -2.74     2.75 
  θ2                 1000     -0.09     1.16      -2.61     3.06 

3DCa 
  a1                   80      0.81     0.24       0.41     1.58 
  a2                   40      0.69     0.19       0.24     1.04 
  a3                   40      0.68     0.17       0.31     1.10 
  b                    80      0.00     1.03      -3.18     2.62 
  θ1                 1000     -0.03     1.03      -2.51     2.76 
  θ2                 1000     -0.04     0.96      -2.72     2.64 
  θ3                 1000      0.03     1.00      -2.89     2.61 

3DCb 
  a1                   80      0.62     0.32       0.10     1.32 
  a2                   40      0.55     0.38       0.10     1.17 
  a3                   40      0.58     0.37       0.10     1.30 
  b                    80     -0.05     0.99      -3.36     2.46 
  θ1                 1000      0.01     1.27      -2.78     2.83 
  θ2                 1000      0.08     1.04      -2.10     2.65 
  θ3                 1000     -0.01     1.03      -2.65     2.45 

Table 8 shows the intercorrelations for the true and estimated item 
parameters for the three-dimensional data. Because the number of values in 
common to the true structure and the imposed structures varies across 
dimensions and solutions, the number of items on which each correlation is 
based is shown in parentheses after each correlation. 
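
The item counts shown in parentheses in Table 8 are simply the overlaps 
between the items assigned to an imposed dimension and the items that truly 
measure each dimension. The Python sketch below computes such overlaps. The 
true and 2DC structures follow the description of the data; the particular 
3DCb clustering used here (items 21 through 60 on the second dimension) is 
hypothetical, chosen only because it splits each true cluster in half, and 
the function names are likewise hypothetical. 

```python
def columns(structure):
    """Turn per-item rows of 0/1 flags into one item-index set per dimension."""
    n_dims = len(structure[0])
    return [{i for i, row in enumerate(structure) if row[k]} for k in range(n_dims)]


def overlap_counts(imposed, true):
    """counts[j][k] = items on imposed dimension j that measure true dimension k."""
    return [[len(est & tru) for tru in columns(true)] for est in columns(imposed)]


first, last = range(0, 40), range(40, 80)
true_3d = [(1, 1, 0) for _ in first] + [(1, 0, 1) for _ in last]
two_dc = [(1, 1) for _ in first] + [(1, 0) for _ in last]
# HYPOTHETICAL 3DCb clustering: items 21-60 on dimension 2, the rest on 3.
three_dcb = [(1, 1, 0) if 20 <= i < 60 else (1, 0, 1) for i in range(80)]

print("2DC :", overlap_counts(two_dc, true_3d))
print("3DCb:", overlap_counts(three_dcb, true_3d))
```

An overlap of zero means no correlation can be computed for that pair of 
dimensions, which is why some cells of Table 8 are reported with N = 0. 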

The values shown in Table 8 indicate that the dimension estimated for the 
1DU solution was most strongly related to the first true dimension. For the 
2DC solution the two estimated dimensions appeared to be equally strongly 
related to the first true dimension. The first estimated dimension appeared 
to be slightly more strongly related to the first true dimension, while the 
second estimated dimension appeared to be more strongly related to the second 
true dimension. 

For the 2DU solution, the first estimated dimension appeared to be 
related to the second true dimension, while the second estimated dimension was 
related to the third true dimension. Neither dimension appeared to be 
strongly related to the first true dimension. 

For the 3DCa solution, the first estimated dimension appeared to 
correspond to the first true dimension, while the second and third estimated 
dimensions corresponded to the second and third true dimensions, respectively. 
For the 3DCb solution, the first estimated dimension was most strongly related 
to the second true dimension, while the second and third estimated dimensions 
both appeared most strongly related to the third true dimension. 

Table 8 

True and Estimated Item Parameter Correlations 
for the Three-Dimensional Data 

Solution/                          True Parameter 
Parameter         a1(N)         a2(N)         a3(N)         b(N) 

1DU 
  a             0.76(80)      0.44(40)     -0.10(40)      .17(80) 
  b            -0.06(80)      0.17(40)      0.25(40)     0.99(80) 

2DC 
  a1            0.57(80)      0.42(40)      0.40(40)     0.00(80) 
  a2            0.58(40)      0.80(40)      0.00(0)      0.16(40) 
  b            -0.10(80)      0.16(40)      0.23(40)     0.99(80) 

2DU 
  a1            0.37(80)      0.70(40)     -0.40(40)     0.11(80) 
  a2            0.24(80)      0.17(40)      0.67(40)    -0.07(80) 
  b            -0.08(80)      0.17(40)      0.24(40)     0.99(80) 

3DCa 
  a1            0.86(80)      0.31(40)     -0.17(40)     0.02(80) 
  a2            0.19(40)      0.86(40)      0.00(0)      0.15(40) 
  a3           -0.09(40)      0.00(0)       0.85(40)     0.11(40) 
  b            -0.09(80)      0.17(40)      0.24(40)     0.99(80) 

3DCb 
  a1            0.40(80)      0.68(40)     -0.29(40)     0.09(80) 
  a2            0.26(40)      0.10(20)      0.52(20)    -0.13(40) 
  a3            0.07(40)     -0.02(20)      0.69(20)    -0.01(40) 
  b            -0.10(80)      0.15(40)      0.25(40)     0.99(80) 



The true and estimated ability parameter intercorrelations are shown in 
Table 9. These data indicate that, for the 1DU solution, the estimated 
abilities were most similar to the first dimension true abilities. For the 
2DC solution, the first dimension ability estimates were most highly 
correlated with the first dimension true abilities, while the second dimension 
ability estimates were most strongly related to the second dimension true 
abilities. The first dimension ability estimates were also fairly strongly 
related to the third dimension true abilities. 

For the 2DU solution, the ability estimates on the first dimension were 
most strongly related to the second dimension true abilities, and the second 
dimension ability estimates were most strongly related to the third dimension 
true abilities. The first and second dimension ability estimates were equally 
strongly related to the first dimension true abilities. 

For the 3DCa solution the first dimension estimates were most strongly 
related to the first dimension true abilities, the second dimension ability 
estimates were most strongly related to the second dimension true abilities, 
and the third dimension estimates were most strongly related to the third 
dimension true abilities. For the 3DCb solution, the first dimension ability 
estimates were strongly related to both the first and second dimension true 
abilities, though the correlation was slightly higher for the second 
dimension. The second and third dimension estimates were both most strongly 
related to the third dimension true abilities. 

Table 10 shows the chi-square, AIC, and CAIC values obtained for each 
solution for the three-dimensional data. For these data not all chi-square 
differences could be tested for significance. Table 11 summarizes which pairs 
of chi-squares could be tested. All testable pairs were significantly 
different. 

Table 9 

True and Estimated Ability Parameter Correlations 
for the Three-Dimensional Data 

                          True Ability Dimension 
Solution/Dimension         1         2         3 

1DU 
  1                      0.80      0.42      0.33 

2DC 
  1                      0.73      0.03      0.62 
  2                      0.32      0.78     -0.40 

2DU 
  1                      0.57      0.74     -0.19 
  2                      0.58     -0.20      0.72 

3DCa 
  1                      0.87      0.29      0.21 
  2                      0.19      0.85     -0.23 
  3                      0.24     -0.27      0.84 

3DCb 
  1                      0.64      0.70     -0.09 
  2                      0.54     -0.25      0.70 
  3                      0.49     -0.25      0.72 



As in the unidimensional case, the chi-square and AIC would result in the 
same ordering of the models for the three-dimensional data. Using either the 
absolute magnitude of the chi-square criterion or the AIC, the 3DCa solution 
would have been selected as best. The unconstrained two-dimensional solution 
was next, while the constrained two-dimensional solution was third. The 3DCb 
solution was fourth, and the unidimensional solution was last. Of course, 
since not all of the pair-wise comparisons are testable, this rank-ordering 
is not entirely objective. 
Using the CAIC resulted in a slightly different ordering of models. The 
3DCa solution was first, as it was using the chi-square and AIC. However, 
using the CAIC, the constrained two-dimensional solution was second. This 
ordering is more reasonable than that obtained using the chi-square and AIC, 
since the constrained two-dimensional solution used an item discrimination 
parameter that was consistent with the true pattern. The unconstrained 
two-dimensional solution was third, the 3DCb solution was fourth, and the 
unidimensional solution was last. 
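
The two orderings described above can be reproduced directly from the 
criterion values reported in Table 10. The Python sketch below ranks the 
solutions from smallest (preferred) to largest value of each criterion; the 
dictionary layout and function name are hypothetical conveniences. 

```python
# Criterion values for the three-dimensional data, copied from Table 10.
table10 = {
    "1DU":  {"chi_square": 70706.4, "AIC": 84823.8, "CAIC": 85769.0},
    "2DC":  {"chi_square": 65828.9, "AIC": 80022.6, "CAIC": 81204.1},
    "2DU":  {"chi_square": 65592.6, "AIC": 79871.3, "CAIC": 81289.2},
    "3DCa": {"chi_square": 65450.4, "AIC": 79723.6, "CAIC": 81141.5},
    "3DCb": {"chi_square": 67548.8, "AIC": 81815.3, "CAIC": 83233.2},
}


def rank_solutions(table, criterion):
    """Order solution names from smallest (preferred) to largest value."""
    return sorted(table, key=lambda name: table[name][criterion])


for criterion in ("chi_square", "AIC", "CAIC"):
    print(f"{criterion:>10}: {' < '.join(rank_solutions(table10, criterion))}")
```

Only the positions of the 2DC and 2DU solutions change between the AIC and 
CAIC orderings; the 2DU solution estimates 40 more a-parameters than the 2DC 
solution, and the heavier CAIC penalty outweighs its smaller chi-square. 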



Table 10 

Model Selection Criteria Values 
for the Three-Dimensional Data 

Solution       Chi-Square       AIC         CAIC 

1DU              70706.4      84823.8     85769.0 
2DC              65828.9      80022.6     81204.1 
2DU              65592.6      79871.3     81289.2 
3DCa (true)      65450.4      79723.6     81141.5 
3DCb             67548.8      81815.3     83233.2 



Table 11 

Testable Pair-Wise Chi-Square Comparisons 
for the Three-Dimensional Data 

Solution        1DU     2DC     2DU     3DCa    3DCb 

1DU              -       *       *       *       * 
2DC                      -       *       *       - 
2DU                                              - 
3DCa (true) 

Note. Dash (-) indicates not testable, asterisk (*) 
indicates testable. 

Analysis of Residuals 

Table 12 summarizes the results of the residual analyses for both the 
unidimensional and three-dimensional data. Shown are the first three 
eigenvalues obtained from a principal components analysis of Pearson product 
moment correlations computed on the matrix of residuals for each CMIRT 
solution. Also shown is the percent of variance and cumulative percent of 
variance corresponding to each eigenvalue. 
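
The residual analysis just described can be sketched as follows: compute 
Pearson correlations between the item columns of a residual matrix, then 
extract the largest eigenvalue. The Python sketch below uses power iteration 
for the eigenvalue, which is a substitute technique chosen to keep the 
example self-contained, not necessarily the method used in this research. 
How the residuals themselves are defined is not assumed here; any 
examinee-by-item matrix can be supplied, and the function names are 
hypothetical. 

```python
from statistics import mean, pstdev


def correlation_matrix(rows):
    """Pearson correlations between the columns of a list-of-rows matrix."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    n = len(rows)

    def corr(i, j):
        cov = sum((a - means[i]) * (b - means[j])
                  for a, b in zip(cols[i], cols[j])) / n
        return cov / (sds[i] * sds[j])

    size = len(cols)
    return [[corr(i, j) for j in range(size)] for i in range(size)]


def first_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a correlation matrix by power iteration.

    Starts from a vector of ones, so it assumes that vector is not
    orthogonal to the dominant eigenvector.
    """
    size = len(matrix)
    vector = [1.0] * size
    value = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size))
                   for i in range(size)]
        value = sum(p * p for p in product) ** 0.5
        vector = [p / value for p in product]
    return value


def percent_of_variance(eigenvalue, n_items):
    """Share of total variance for one component of a correlation matrix."""
    return 100.0 * eigenvalue / n_items
```

For an 80-item test, a first residual eigenvalue of 1.62 corresponds to 
about 2 percent of the variance, which is the scale on which the values in 
Table 12 are reported. 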

For the unidimensional data, the results reported in Table 12 indicate 
no meaningful variation remaining in the residuals. This is consistent with 
the fact that the data were truly unidimensional. It is interesting to note 
that increasing the number of parameters estimated did not reduce the size of 
the first eigenvalue of the residuals to any meaningful degree. 

For the three-dimensional data, the pattern is quite different. For 
these data, increasing the number of estimated parameters noticeably reduced 
the size of the first eigenvalue, and correctly clustering the items in the 
3DCa solution reduced the first eigenvalue to a smaller value than was 
obtained for the 2DU solution, even though the number of item parameters 
estimated did not increase. Incorrectly clustering the items in the 3DCb 
solution, on the other hand, did not produce a smaller first eigenvalue than 
was obtained for the 2DU solution. 

Table 12 

Principal Components Analysis of Residuals 

Dataset/Component/                        Solution 
Statistic                1DU      2DC      2DU      3DCa     3DCb 

Unidimensional 
  1  Eigenvalue          1.62     1.62     1.61 
     % Variance          2.02     2.02     2.02 
     Cumulative %        2.02     2.02     2.02 
  2  Eigenvalue          1.56     1.56     1.57 
     % Variance          1.97     1.95     1.96 
     Cumulative %        3.99     3.97     3.98 
  3  Eigenvalue          1.54     1.55     1.55 
     % Variance          1.92     1.94     1.94 
     Cumulative %        5.91     5.91     5.92 

Three-dimensional 
  1  Eigenvalue          8.37     2.07     1.95     1.63     1.93 
     % Variance         10.46     2.58     2.44     2.04     2.42 
     Cumulative %       10.46     2.58     2.44     2.04     2.42 
  2  Eigenvalue          1.96     1.61     1.61     1.61     1.63 
     % Variance          2.45     2.01     2.02     2.01     2.03 
     Cumulative %       12.91     4.60     4.45     4.05     4.45 
  3  Eigenvalue          1.72     1.57     1.57     1.58     1.58 
     % Variance          2.15     1.96     1.97     1.98     1.98 
     Cumulative %       15.07     6.56     6.42     6.02     6.43 

Summary and Conclusions 

The purpose of this research was to develop and evaluate a confirmatory 
approach to assessing test structure using multidimensional item response 
theory. The approach investigated involves adding to the exponent of the MIRT 
model an item structure matrix that allows the user to specify what ability 
dimensions are measured by an item. Various combinations of item structures 
were fit to two sets of simulation data with known true structures, and the 
results were evaluated using three different model selection criteria and an 
analysis of residuals procedure. In addition, item and principal components 
analyses were performed to assess the reasonableness of the data. 

The results of the item and principal components analyses tend to support 
the reasonableness of the CMIRT model. The data generated according to both 
the unidimensional and three-dimensional models appeared to be realistic with 
respect to item difficulty and discrimination, and the structure of each test, 
as revealed by the principal components analysis, was neither unrealistic nor 
uncommon. The reliabilities of the tests did appear to be a little higher 
than normally obtained with real data, as did the correlations between the 
item biserials and difficulties, but these results were most likely a 
reflection of the purity of the simulated data. 

The comparisons among the various solutions derived for each set of data 
using the three model selection criteria were encouraging. The likelihood 
ratio chi-square statistic was clearly inadequate, since its significance 
could not always be tested, and both the chi-square and AIC statistics tended 
to result in over-parameterization. However, the CAIC criterion appeared to 
function quite well. For both the unidimensional and three-dimensional data, 
the CAIC criterion resulted in selection of the true structure. 

In addition to finding that the procedures could recover the true item 
structures, it was also found that adding an additional ability dimension that 
forces together items that ought not to be together (the 3DCb solution) 
noticeably deteriorates the quality of the solution. On the other hand, 
imposing structures different from, but not inconsistent with, the true 
structure (the 2D solutions) does not necessarily yield worse fit. 

The residual analyses indicated that, for the unidimensional data, adding 
additional dimensions did not reduce the proportion of common variance 
remaining in the residuals below what was obtained for the unidimensional 
solution. For the three-dimensional data, however, adding dimensions did 
reduce the remaining common variance below what was obtained for the 
unidimensional solution, and correctly clustering items reduced the remaining 
common variance below what was obtained when items were incorrectly clustered, 
even when the number of dimensions (or parameters) did not increase. 